Exploring energy demand data in light of recent policy changes
Author: Colleen McCamy
Affiliation: MEDS
Published: December 3, 2022
the question
Did the Time-of-Use electricity rate transition have an effect on peak energy demand in the greater San Diego region?
introduction
California has ambitious clean energy and decarbonization goals. 1 To achieve these goals, California will need to increase its current electricity grid capacity by about three times. 2 Time-of-use electricity rates are a valuable strategy to help reduce the investment needed for expanding grid capacity and can help maximize the use of renewable resources. 3
The Time-of-Use (TOU) energy rate referenced in this analysis establishes lower electricity prices for times when there is more renewable energy supply available and helps to encourage electricity use when generation is cleanest and lowest-cost. Additionally, energy rates are higher during the early evening peak to help promote less energy use during times when renewable energy supply decreases and grid operators need to ramp up generation from fossil-fuel based power plants. For the default residential rate, the higher cost hours or “peak hours” are from 4:00 - 9:00 p.m. 4
While many utilities have piloted time-varying rates, the recent California TOU transition was the biggest test of time-based rates yet - automatically switching over 20 million electricity consumers to a TOU rate. 5 While there has been some initial research and analysis on time-based rates in electricity markets, there are limited results on mass-market transitions and how time-based rates affect total electricity consumption.
Answering the question of whether the Time-of-Use electricity rate transition had an effect on peak energy demand in the San Diego region can provide insight and spur further investigation of time-based rates throughout the state of California and beyond.
the data
Energy Demand Data
Energy demand data used in this analysis are publicly available and provided by the US Energy Information Administration. The data were downloaded via the API dashboard 6 and were selected to include the time frame of July 1, 2018 to July 31, 2022, hourly demand by subregion in megawatt hours (MWh) in the local time zone (Pacific), and the San Diego Gas and Electric (SDGE) subregion. SDGE serves 3.7 million people through 1.5 million electric meters covering 4,100 square miles in San Diego and southern Orange counties. The energy demand data are an aggregate electricity demand from all customers throughout SDGE’s service territory.7
checkout the code
# loading the necessary libraries
library(dplyr)
library(tidyverse)
library(here)
library(readr)
library(gt)
library(tufte)
library(feasts)
library(janitor)
library(lubridate)
library(broom)
library(tsibble)
library(ggpubr)
library(ggiraph)
library(ggiraphExtra)
library(sjPlot)
library(ggcorrplot)
library(car)
library(modelr)

# setting my root directory
rootdir <- ("/Users/colleenmccamy/Documents/MEDS/EDS_222_Stats/final_project")

# reading in the data
eia_data_raw <- read_csv(paste0(rootdir, "/data/eia_data.csv"))

# cleaning the data to be the two variables of interest
eia_df <- eia_data_raw |>
  select(date, hourly_energy_mwh) |>
  na.omit()

# creating a time series dataframe
eia_ts <- eia_df |>
  as_tsibble()
Temperature Data
In California, peak electricity demand and temperature are highly correlated. 8 Because this investigation looks into energy demand, temperature data were added to the analysis. The temperature data used in the following analysis are publicly available through the NOWData Online Weather Data portal from the National Weather Service, a branch of the National Oceanic and Atmospheric Administration. 9 The temperature data include the daily maximum, minimum, and average temperature in Fahrenheit, averaged across numerous weather stations throughout San Diego County. This analysis uses the maximum daily temperature over the same time period as the energy demand data.
Since the temperature data are an aggregate of multiple stations throughout San Diego County, they can introduce bias: the SDGE service territory covers several different climate zones, which may not be accurately represented by the average of the stations. Also, the weather stations are more heavily concentrated towards the coast, which could bias the maximum temperatures.
checkout the code
# loading in the temperature data
temp_data <- read_csv(paste0(rootdir, "/data/sd_temp_data.csv"))

# wrangling the data: converting temperature columns to numeric and parsing the date
temp_data <- temp_data |>
  mutate(temp_max = as.numeric(temp_max),
         temp_min = as.numeric(temp_min),
         temp_avg = as.numeric(temp_avg),
         temp_dept = as.numeric(temp_dept),
         date = lubridate::mdy(Date)) |>
  select(!Date)

# restructuring the eia data to merge the dataset with the temperature data by date
eia_data <- eia_df |>
  mutate(time = (date)) |>
  mutate(date = as.Date(date))

# keeping the time of day as an "HH:MM:SS" string
eia_data$time <- format(eia_data$time, format = "%H:%M:%S")

# merging the data into one dataframe
energy_temp_df <- left_join(x = eia_data,
                            y = temp_data,
                            by = "date")
Exploratory Data Visualizations
The following figures outline the sum of daily energy demand during peak hours and the daily max temperature.
checkout the code
# exploring the data by plotting energy demand throughout time
energy_demand_plot <- ggplot(data = eia_df,
                             aes(x = date, y = hourly_energy_mwh)) +
  geom_line(col = "#b52b8c") +
  labs(title = "Hourly Energy Demand (MWh)",
       x = "Date",
       y = "MWh") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

# exploring the data by plotting maximum temperature throughout time
max_temp_plot <- ggplot(temp_data, aes(x = date, y = temp_max)) +
  geom_line(col = "#52796f") +
  labs(title = "Maximum Temperature per day (°F)",
       x = "Date",
       y = "Max Temperature (°F)") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

# creating dataframe for tou peak hours
# time is still an "HH:MM:SS" string here, so the hour is extracted as an integer
# before comparing (hours 16 through 21, matching the peak_hours indicator in the model)
tou_peak_hours_df <- energy_temp_df |>
  filter(between(as.integer(substr(time, 1, 2)), 16, 21))

# grouping it for daily peak hours to plot with daily maximum temperature
daily_peak_hrs_df <- tou_peak_hours_df |>
  group_by(date) |>
  summarize(daily_energy_mwh = sum(hourly_energy_mwh))

# plotting daily peak energy demand with daily max temperatures
peak_demand_plot <- ggplot(data = daily_peak_hrs_df,
                           aes(x = date, y = daily_energy_mwh)) +
  geom_line(col = "#b52b8c") +
  labs(title = "Daily Energy Demand During Peak Hours (MWh)",
       x = "Date",
       y = "MWh") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

peak_demand_plot
checkout the code
# plotting along with daily temperature
ggarrange(peak_demand_plot, max_temp_plot,
          ncol = 2, nrow = 1)
analysis
A multiple linear regression and a time series decomposition analysis can help answer the question at hand. Prior to fitting the linear model, I also used summary statistics to determine a cutoff point for a dichotomous temperature variable.
linear model
To investigate whether the implementation of the TOU policy had an effect on energy demand, I used a multiple linear regression model. Because other factors also affect electricity demand, I included temperature and hour of the day as additional predictors of hourly electricity demand. The equation for this model is: \[\text{HourlyDemand}_i = \beta_{0} + \beta_{1} \cdot \text{TOUPolicy}_i + \beta_{2} \cdot \text{HotDay}_i + \beta_{3} \cdot \text{PeakHour}_i + \varepsilon_i\] where \(\text{HourlyDemand}_i\) is the hourly electricity demand in MWh (‘hourly_energy_mwh’ in the code).
The ‘TOUPolicy’ predictor (‘tou_policy’ in the results) is a dichotomous variable indicating if the TOU Policy was in effect or not. The ‘PeakHour’ predictor is also a dichotomous variable which indicates whether or not the hour of the day was during peak times from 4:00 - 9:00 p.m.
Lastly, the ‘HotDay’ variable (‘hot_day’ in the results) is a dichotomous variable indicating whether the maximum temperature for the San Diego region was at or above 80 °F or below 80 °F. This cutoff was determined by looking at the mean and standard deviation of the maximum temperature in San Diego during the time of interest. As outlined in the boxplot below, the average maximum temperature was about 72 °F and the standard deviation about 7 °F. Thus 80 °F, roughly one standard deviation above the mean, was chosen as the threshold for a ‘hot day’ when looking at the effect of temperature on hourly electricity demand.
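As a rough illustration of how the cutoff and the dichotomous predictors described above could be derived, the sketch below uses small made-up vectors rather than the project data. The names `temp_max_example`, `hour_example`, and the other `*_example` objects are hypothetical, and the peak window simply mirrors the 16 to 21 hour range used in the project code.

```r
# Hypothetical daily maximum temperatures (°F); NOT the project data
temp_max_example <- c(65, 68, 70, 72, 74, 76, 79, 82, 85, 66)

# Mean plus one standard deviation, computed the same way on the made-up values
example_guide <- mean(temp_max_example) + sd(temp_max_example)

# With the values reported in the text (mean about 72, sd about 7),
# the same arithmetic gives 79, which sits just under the 80 °F cutoff
reported_guide <- 72 + 7
hot_day_cutoff <- 80

# Dichotomous hot-day indicator: 1 if max temperature >= cutoff, else 0
hot_day_example <- as.integer(temp_max_example >= hot_day_cutoff)

# Dichotomous peak-hour indicator from hour of day (0-23)
hour_example <- 0:23
peak_hours_example <- as.integer(hour_example >= 16 & hour_example <= 21)

data.frame(hour = hour_example, peak_hours = peak_hours_example)
```

Whether hour 21 belongs in the peak window depends on whether each timestamp marks the start or the end of the hour, so that boundary is worth checking against the data documentation.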
checkout the code
### ---- Determining a "Hot Day" ----
# determining the mean and standard deviation for the time period of interest
mean_max_temp <- mean(energy_temp_df$temp_max, na.rm = TRUE)
sd_max_temp <- sd(energy_temp_df$temp_max, na.rm = TRUE)
print(mean_max_temp)
print(sd_max_temp)

# preparing the data to plot (a one-column tibble named "value")
box_data <- tibble(value = energy_temp_df$temp_max)

# plotting the distribution of maximum daily temperature
temp_box <- ggplot(box_data) +
  geom_boxplot(aes(x = value)) +
  labs(x = "Maximum Daily Temperature (°F)") +
  theme_minimal()

temp_box
checkout the code
### ---- Adding a 'Hot Day' Indicator in the Dataframe ----
# hot_day is 1 at or above 80 °F and 0 below 80 °F, as described in the text
temp_demand_daily <- energy_temp_df |>
  group_by(date) |>
  summarize(daily_energy_mwh = sum(hourly_energy_mwh)) |>
  left_join(temp_data, by = "date") |>
  mutate(hot_day = case_when(
    (temp_max >= 80) ~ 1,
    (temp_max < 80) ~ 0))

### ----- Adding TOU Policy and Peak Hours to Dataframe -----
# adding a separate numeric year column in the dataframe
energy_temp_df <- energy_temp_df |>
  mutate(year = lubridate::year(date))

# using variables to create dichotomous predictors
energy_temp_df <- energy_temp_df |>
  # converting the "HH:MM:SS" string to the hour of the day (0-23)
  mutate(time = as_datetime(time, format = "%H:%M:%S")) |>
  mutate(time = lubridate::hour(time)) |>
  mutate(tou_policy = case_when(
    (year > 2020) ~ 1,
    (year <= 2020) ~ 0)) |>
  mutate(peak_hours = case_when(
    (time < 16) ~ 0,
    (time >= 16 & time <= 21) ~ 1,
    (time > 21) ~ 0)) |>
  mutate(hot_day = case_when(
    (temp_max >= 80) ~ 1,
    (temp_max < 80) ~ 0))

#### ----- Linear Regression on Hourly Energy Demand ---- ###
model_tou_peak_demand <- lm(formula = hourly_energy_mwh ~ tou_policy + peak_hours + hot_day,
                            data = energy_temp_df)
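To build some intuition for what a regression on three dichotomous predictors estimates, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the data frame `sim`, the sample size, the share of observations in each group, and the "true" coefficients are invented and have no connection to the SDGE results.

```r
set.seed(42)
n <- 5000

# Hypothetical 0/1 predictors; NOT the project data
sim <- data.frame(
  tou_policy = rbinom(n, 1, 0.5),
  peak_hours = rbinom(n, 1, 0.25),
  hot_day    = rbinom(n, 1, 0.15)
)

# Made-up "true" coefficients chosen only for illustration
sim$hourly_energy_mwh <- 2000 -
  100 * sim$tou_policy -
  300 * sim$peak_hours +
  400 * sim$hot_day +
  rnorm(n, sd = 50)

# Same model structure as the analysis: demand ~ three dummy variables
sim_model <- lm(hourly_energy_mwh ~ tou_policy + peak_hours + hot_day, data = sim)

# Each coefficient estimates the average shift in demand when that indicator
# switches from 0 to 1, holding the other two indicators fixed
sim_coefs <- coef(sim_model)
round(sim_coefs)
```

The intercept corresponds to the group where all three indicators are 0, which is why the baseline in the results is described as a non-hot day, outside peak hours, before the policy.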
time series analysis
To dive further into additional effects on energy demand, I conducted a classical decomposition analysis to look into other factors that influence electricity demand, such as seasonality or overall energy demand trends.
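For readers new to the method, an additive classical decomposition splits a series into trend, seasonal, and remainder components that sum back to the observed values. The sketch below shows this on a made-up daily series with a weekly pattern. It uses base R's `decompose()` rather than the `feasts` function used in this project, and the series, the weekly pattern, and all object names are assumptions for illustration only.

```r
# Hypothetical daily series: flat level + weekly pattern + noise (NOT project data)
set.seed(1)
n_days <- 7 * 52
weekly_pattern <- c(-20, -10, 0, 10, 20, 5, -5)
demand_example <- 1000 + rep(weekly_pattern, times = 52) + rnorm(n_days, sd = 3)

# frequency = 7 tells decompose() that one seasonal cycle spans 7 observations
demand_ts <- ts(demand_example, frequency = 7)
parts <- decompose(demand_ts, type = "additive")

# Additive model: observed = trend + seasonal + random
# (the trend, and therefore the remainder, is NA at the edges of the series)
recon <- parts$trend + parts$seasonal + parts$random
max_recon_error <- max(abs(recon - demand_ts), na.rm = TRUE)

# Estimated seasonal effect for each position in the 7-day cycle
round(parts$figure, 1)
```

Changing the `frequency` value plays the same role as switching between `season(30)` and `season(365)`: it sets how long one seasonal cycle is assumed to be.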
checkout the code
# building a contiguous daily index, one value per day of data, starting July 1, 2018
x = seq(from = ymd('2018-07-1'),
        length.out = 1481,
        by = 'day')

# preparing the dataframe for the time series
decom_df <- energy_temp_df |>
  group_by(date) |>
  summarize(daily_energy_mwh = sum(hourly_energy_mwh)) |>
  mutate(index = x)

decom_ts <- as_tsibble(decom_df, index = index)

# classical decomposition with a 365-day seasonal period
decom_plot_annual <- model(decom_ts,
                           classical_decomposition(daily_energy_mwh ~ season(365),
                                                   type = "additive")) |>
  components() |>
  autoplot(col = "#3d405b") +
  theme_minimal() +
  labs(title = "Classical Decomposition Model",
       subtitle = "Seasonality defined as 365 days",
       x = "Date",
       caption = "Figure 3")

# classical decomposition with a 30-day seasonal period
decom_plot_monthly <- model(decom_ts,
                            classical_decomposition(daily_energy_mwh ~ season(30),
                                                    type = "additive")) |>
  components() |>
  autoplot(col = "#3d405b") +
  theme_minimal() +
  labs(title = "Classical Decomposition Model",
       subtitle = "Seasonality defined as 30 days",
       x = "Date",
       caption = "Figure 2")
results
Multiple Linear Regression:
All of the predictors used in the regression are significant predictors of hourly electricity demand at a significance level of 0.001, as each had a p-value of about 2e-16 (that is, 2 × 10⁻¹⁶). The model indicates that when daily maximum temperature is below 80 °F, for non-peak hours and prior to the Time-of-Use implementation in 2020, the average hourly electricity demand is about 2,164 MWh for SDGE’s service territory. In addition, we expect to see on average a decrease in hourly electricity demand of about 108 MWh for years after the Time-of-Use energy policy was implemented, holding all other predictors constant. For days on which the maximum temperature is at or above 80 °F, the model predicts that the average hourly electricity demand increases by about 409 MWh, holding all other predictors constant.
Interestingly, the model predicts that the average hourly electricity demand decreases by about 360 MWh during peak hours, holding all other predictors constant. At first thought, we might expect to see hourly electricity demand increase during peak times, since these are the hours that the time-of-use rate designates as times of high demand. However, this analysis didn’t look at the amount of renewable electricity available on the grid. Thus, overall energy demand may be lower during peak times, but it is possible that average hourly electricity demand as a share of the renewable electricity available on the grid is higher during peak times than during non-peak times.
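The coefficients above can be combined into a predicted average demand for each combination of the three indicators. The sketch below does that arithmetic with the rounded estimates quoted in the text, reading the 360 MWh decrease as the peak-hour coefficient. The object names are hypothetical, and because the inputs are rounded the outputs are approximate.

```r
# Rounded coefficient estimates as reported in the text (MWh)
b0     <- 2164   # intercept: below 80 °F, non-peak hour, before the TOU policy
b_tou  <- -108   # after the TOU policy
b_hot  <-  409   # maximum temperature at or above 80 °F
b_peak <- -360   # peak hour

# All eight combinations of the three dichotomous predictors
scenarios <- expand.grid(tou_policy = 0:1, hot_day = 0:1, peak_hours = 0:1)

# Predicted average hourly demand for each combination
scenarios$predicted_mwh <- b0 +
  b_tou  * scenarios$tou_policy +
  b_hot  * scenarios$hot_day +
  b_peak * scenarios$peak_hours

scenarios
```

Because the model has no interaction terms, the gap between the policy and no-policy rows is the same 108 MWh in every temperature and peak-hour combination.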
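Table 1 highlights the estimates, p-values, and confidence intervals for each of the predictors and the intercept. Using the rounded estimates reported above, the fitted linear regression model is approximately: \[\widehat{\text{HourlyDemand}}_i \approx 2164 - 108 \cdot \text{TOUPolicy}_i + 409 \cdot \text{HotDay}_i - 360 \cdot \text{PeakHour}_i\]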
The figure above graphs the magnitude of the decrease associated with the TOU policy for each combination of the other predictors.
Time Series Analysis - Classical Decomposition:
To better understand possible seasonal and overall trends in electricity demand, we can look at classical decomposition graphs of our time series data for both yearly and monthly seasonality (Figures 2 and 3).
Looking at the graphs, there doesn’t appear to be evidence of a long-run trend in energy demand over the time period analyzed, as the trend seems to be mostly constant whether seasonality is defined as 30 days or 365 days. It also appears that seasonality may be important in driving overall variation in electricity demand when seasonality is defined as 365 days, since the gray bar is closer in scale to the overall time series graph. Anecdotally, this is intuitive, as we would expect electricity demand to vary with the month. Since month of the year and temperature are related, it is logical that seasonality based on the time of year affects energy demand, given the known relationship between temperature and energy demand. However, when seasonality is defined as 30 days, the seasonal effect appears to be less important in driving overall variation in electricity demand.
discussion & conclusion
supporting figures & links
To see the full repository, check out the project on GitHub at:
https://github.com/colleenmccamy/tou-analysis
QQ Plot for hourly energy demand residuals.
checkout the code
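# adding model predictions and computing residuals (observed minus predicted)
aug <- energy_temp_df |>
  add_predictions(model_tou_peak_demand) |>
  mutate(residuals_energy = hourly_energy_mwh - pred)

# QQ plot of the residuals
qqPlot(aug$residuals_energy)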
[1] 16604 16605
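The two numbers printed above appear to be the row indices of the residuals that `qqPlot()` labels as most extreme. As a small check on how the residuals themselves are built, the sketch below fits a toy model on simulated data (not the project data) and confirms that observed minus predicted matches what `residuals()` returns. It uses base R's `qqnorm()` instead of `car::qqPlot()` so that it runs without extra packages, and all object names are hypothetical.

```r
# Simulated toy data; NOT the project data
set.seed(7)
x_example <- rbinom(200, 1, 0.5)
y_example <- 10 + 3 * x_example + rnorm(200)

toy_fit <- lm(y_example ~ x_example)

# Residuals computed by hand: observed minus predicted
manual_resid <- y_example - predict(toy_fit)

# Should agree with the residuals stored in the fitted model
resid_match <- isTRUE(all.equal(unname(manual_resid), unname(residuals(toy_fit))))

# Theoretical vs. sample quantiles behind a normal QQ plot (no plot drawn here)
qq <- qqnorm(manual_resid, plot.it = FALSE)
head(data.frame(theoretical = qq$x, sample = qq$y))
```

Points that fall far from a straight line in the theoretical versus sample comparison suggest residuals that depart from a normal distribution.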
references
Footnotes
California, State of. 2022. “California Releases World’s First Plan to Achieve Net Zero Carbon Pollution.” California Governor. November 16, 2022. https://www.gov.ca.gov/2022/11/16/california-releases-worlds-first-plan-to-achieve-net-zero-carbon-pollution/.
California, State of. 2022. “California Releases World’s First Plan to Achieve Net Zero Carbon Pollution.” California Governor. November 16, 2022. https://www.gov.ca.gov/2022/11/16/california-releases-worlds-first-plan-to-achieve-net-zero-carbon-pollution/.
“API Dashboard - U.S. Energy Information Administration (EIA).” n.d. Accessed December 3, 2022. https://www.eia.gov/opendata/browser/electricity/rto/region-sub-ba-data.
“Our Company | San Diego Gas & Electric.” n.d. Accessed December 3, 2022. https://www.sdge.com/more-information/our-company.
Miller, Norman L., Katharine Hayhoe, Jiming Jin, and Maximilian Auffhammer. 2008. “Climate, Extreme Heat, and Electricity Demand in California.” Journal of Applied Meteorology and Climatology 47 (6): 1834–44. https://doi.org/10.1175/2007JAMC1480.1.
US Department of Commerce, NOAA. n.d. “Climate.” NOAA’s National Weather Service. Accessed December 3, 2022. https://www.weather.gov/wrh/Climate?wfo=sgx.
Citation
BibTeX citation:
@online{mccamy2022,
author = {Colleen McCamy},
title = {Time-of-Use {Energy} {Analysis}},
date = {2022-12-03},
url = {https://colleenmccamy.github.io/2022-12-03-tou-policy-analysis},
langid = {en}
}